ABSTRACT

In 1992 the Utah State Office of Education initiated 
a review of recent literature on performance assessment as a step in 
establishing the foundation of the Utah State Core Curriculum 
Performance Assessment Program. Profiles Corporation conducted the 
review of the literature and contacted educators in all 50 states 
regarding the current state of large-scale performance assessment. 
This summary of the nature, design, and use of performance 
assessments is based on the review and survey. Performance assessment 
is a response to the calls for educational change that are sweeping 
the country, and it reflects society's need to produce creative
problem solvers, critical thinkers, and information processors. The 
literature review and the survey make it clear that performance 
assessments are being developed by states and districts across the 
country. In general, states are taking their time in developing and 
testing the new assessments because they are mindful of the problems 
involved in performance assessment and are attaching high stakes to 
its results. Implications of the review and survey for the state 
core-curriculum program are discussed. Survey results are summarized 
in an attachment, and there is a nine-page table of findings.
(Contains 34 references.) (SLD) 

REVIEW OF LITERATURE



Introduction 

In 1992, the Utah State Office of Education initiated a review of recent
literature on performance assessment for the purpose of establishing a solid
foundation on which to build the Utah State Core Curriculum Performance Assessment
Program. Profiles Corporation conducted the review of the literature and
contacted educators in all fifty states regarding the current state of large-scale
performance assessment. This summary of the nature, design, and use of
performance assessments examines the following topics:

• A brief historical background of assessment in the United States 

• How performance assessment responds to the changing educational needs of
today's students

• The impact performance assessments can have on educational practices as 
they model new methods of instruction 

• How performance assessment places evaluation in the hands of teachers and 
students 

• The ramifications of performance assessment's criterion-referenced orienta- 
tion 

• How performance assessment takes the secrecy out of testing 

• Prominent psychometric issues related to performance assessment

• Preliminary conclusions of the literature review and the Profiles survey 

• Initial implications of the review and the Profiles study for the Utah State 
Performance Assessment 



Historical background 

Performance assessment is a response to broad, fundamental changes that are
sweeping over both education and the larger society. Earlier in this century,
when the United States was an industrial-based society, it seemed logical for
educators to spread the country's school children across an achievement
continuum. Traditional assessment's goal was to attain the greatest possible
refinement in selecting the cream of the crop for higher education. The vast
majority of students, represented by the great bulge in the middle of
psychometricians' bell-shaped curve and the tail below it, were tracked for jobs
in factories and other kinds of manual labor (Hill, 1992; Stiggins, 1991). Such
assessment honored the notion that there was a step on society's ladder for
everyone, and that norm-referenced tests would help identify the step where each
person should probably sit. As our economy has shifted to an information/services
base implemented by high technology, the achievement levels required by most jobs
have risen dramatically and are fundamentally different in kind than previous
levels. Those students represented in the bottom half of the curve find that
there is no longer necessarily a niche for them. Many of the lower steps have
been removed from society's ladder. These students have become a population at
risk. Some researchers have gone so far as to say that our educational system
and the accompanying assessment scheme that relegated students to their
position in society have become obsolete (Stiggins, 1991).

As the United States faces the new challenges of the next century, the issue of
accountability in education has become paramount in the eyes of many of the
nation's decision makers. Congress, professional organizations, and the
National Council on Educational Standards and Testing want to hold districts,
teachers, and students accountable for their educational outcomes. Their
purpose is to ensure the quality of education in our country and to see that equal
opportunities are available to all school children. They believe this can be
accomplished by:

1) setting clear, high standards that define what students should know and how
well they should know it;

2) setting criteria for what schools must provide so that students can actually
reach those standards;

3) assessing all students rather than just a sample; 

4) making assessments more effective by resorting increasingly to performance- 
based measures (Wolf et al, 1992). 

The emphasis has shifted from ranking children on the basis of their basic skills 
to improving both schools' and students' performances. In an educational envi- 
ronment that has been assessed almost exclusively by norm-referenced tests, 
performance assessment can bring balance to a lopsided picture by evaluating 
broad outcomes in addition to discrete skills, quality of learning as well as 
breadth of learning, process as well as product, and divergent thinking as well 
as conformity (Finch & Dost, 1992). 



Responding to the educational needs of today's students 

Performance assessment reflects our need to produce a society of creative
problem-solvers, critical thinkers, and information processors, as opposed to
memorizers of isolated facts. While basic skills are, by definition, fundamental,
they can do no more than provide a foundation for the development of more
important skills. The nation's students must be able to apply basic skills in new
and complex situations. Performance assessment takes testing beyond the "right
answer" mentality and the most elementary levels of thinking. It evaluates
students' ability to organize and utilize their knowledge "tool kits" to produce
desired outcomes. It accomplishes this by requiring them to: 1) bring a number
of skills to bear on complex, multi-step problems; 2) structure the problems; 3)
integrate many separate pieces of knowledge and several thinking processes in
one task; 4) find multiple paths and solutions; and, finally, 5) reflect on and
evaluate their own performance (Stiggins, 1991; Wiggins, 1992; Finch & Dost, 1992).

Promoting educational reform

Performance assessment should provide appropriate models for educational
reform. In fact, some educators have suggested that it should derive its
evidence for validity primarily from its success in advancing educational reform
(Hill, 1992). Rather than existing out of context with instruction, it has the
potential to positively impact modes of teaching and learning by becoming an
integral part of instruction itself (Finch, 1992; Hill, 1992; O'Neil, 1992). Tasks
should be embedded in authentic contexts relevant to both the wider world and
day-to-day instructional settings. These settings should also be meaningful,
engaging, and essential to the student. There is considerable evidence showing
that performance assessment is positively influencing instructional practices
and student performance in states and districts where it is being used (Herman,
1992; Profiles 1992 survey).

Assessment in the hands of teachers and students

Assessment has traditionally been the realm of specialists in measurement and
statistics. Performance assessment puts evaluation squarely in the hands of
teachers. Research has shown that teachers are capable and reliable assessors
of performance units; interrater reliabilities of .90 have often been achieved
(Easton, 1991; Dunbar et al, 1992). This experience, in turn, enhances teachers'
skills as evaluators in day-to-day classroom assessments, where 99% of all
assessment takes place. Teachers and administrators are in a better position to
shape the assessment process because performance assessments can respond
readily to their achievement targets (Stiggins, 1991). Students are also drawn
into the assessment process, particularly if they collect their work in portfolios
and have the opportunity to help select which pieces are to remain in the
portfolios each year (Wiggins, 1992; Wolf et al, 1992). In addition, performance units
can and should require students to assess their own work by reflecting on the
strengths and weaknesses of their problem-solving experiences.
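
The interrater reliabilities cited above can be illustrated with a small
calculation. The sketch below is only an illustration under stated assumptions,
not the method used in the studies cited: it treats interrater reliability as the
Pearson correlation between two raters' scores on the same set of student
responses, and it also reports exact and within-one-point agreement rates. The
function names, the 1-4 rubric, and the sample scores are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a rater gives constant scores")
    return cov / math.sqrt(var_x * var_y)


def agreement_rate(x, y, tolerance=0):
    """Share of responses on which the two raters differ by at most `tolerance` points."""
    if len(x) != len(y) or not x:
        raise ValueError("need two equal-length, non-empty lists of scores")
    return sum(abs(a - b) <= tolerance for a, b in zip(x, y)) / len(x)


# Hypothetical scores (1-4 rubric) from two teachers rating the same ten papers.
teacher_a = [4, 3, 2, 4, 1, 3, 2, 4, 3, 1]
teacher_b = [4, 3, 3, 4, 1, 3, 2, 4, 2, 1]

print(f"correlation: {pearson(teacher_a, teacher_b):.2f}")
print(f"exact agreement: {agreement_rate(teacher_a, teacher_b):.0%}")
print(f"agreement within one point: {agreement_rate(teacher_a, teacher_b, tolerance=1):.0%}")
```

A correlation can be high even when one rater is consistently harsher than the
other, which is why the sketch reports agreement rates alongside it.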

Criterion-referenced approach to testing 

Performance assessment's criterion-referenced orientation aims to reduce vari- 
ability between the best and worst performances by students rather than to pro- 
duce fine distinctions between them. It also demands high standards of 
performance for all students by modeling quality exit-level products. For exam- 
ple, Illinois has assessed 6th grade writing tasks by using outstanding 8th grade 
examples as the scoring standard (Wiggins, 1992). These standards should not 
be arbitrary cut scores but should reflect benchmarks rooted in the real world.
Performance assessment places an emphasis on outcomes by steering away from 
relative comparisons of students' work and focusing on what students can actu- 
ally do. The curve disappears; it is simply not relevant in a criterion-referenced 
context. 

Secrecy in testing disappears



The mystery and secrecy that surround traditional testing are not appropriate in
performance assessment. Tasks and standards should be known to teachers and
students in advance (Wiggins, 1992; Diez & Moon, 1992). Teachers use these
scoring criteria to design classroom experiences that will prepare students for
the assessment. Tasks are practiced in the classroom during daily lessons,
much as a piano piece is practiced in preparation for a recital performance. The
goal is to produce high quality outcomes, and this can come about only by
returning again and again to the task and refining and reevaluating one's
efforts.

Some psychometric issues in performance assessment 

Designers and users of performance assessments and traditional assessment
specialists have grappled with the issues of evidence for their validity and
reliability. Most agree that the old models of validity and reliability are not adequate
to evaluate performance assessments (Linn et al, 1991; Hill, 1992). But what
needs to be expanded and what needs to be kept intact? Broad, representative
sampling of the curriculum seems to be important to the extent that the
performance assessment does not serve to narrow instructional focus and thus
sabotage its own efforts at educational reform (Herman, 1992). While a narrower
focus may result in a more reliable test in the traditional sense, it reduces the
evidence for content validity (Dunbar et al, 1992). Some psychometricians
caution that inadequate sampling of the content domain also reduces the
assessments' ability to give accurate estimates of individual performance levels.
Shavelson and colleagues (1991; 1992) suggest that perhaps eight to twenty
tasks covering ten different topic areas may be needed for a single subject area.
Given the costs of scoring performance units, this could best be accomplished by
having students add different units to their portfolios over the course of several
years. Portfolios would also emphasize habitual outcomes to students and their
parents. Most importantly, they would chart growth in both groups and
individuals (Wiggins, 1992; Wolf et al, 1992). On the other hand, Hill (1992) and others
contend that traditional models of reliability and validity should have a minor
role in performance assessment. In the context of performance tasks, reliability
becomes an important issue only if a lack of it negatively affects future
instruction. In a related mode, evidence for consequential validity, not content validity,
is of primary importance in evaluating performance assessments. In other
words, does the assessment help to accomplish the desired educational reform?

Since students typically perform at somewhat different achievement levels from
one task to another in a given subject area, it is not advisable to report student
achievement by a single evaluation for each subject area. This variation in a
student's scores within a single content area is not a troublesome phenomenon,
however. On the contrary, it most likely reflects desirable variation in teaching
and evaluation methods (Dunbar et al, 1991). Similarly, subscores within a
single content area are regularly reported in traditional standardized tests
because of this variation. However, an important distinction should be made
between the two kinds of tests. In traditional tests the subscores represent a
breakdown of skill areas, but in performance assessments it is tasks, not skills,
that are reflected in the subscores, since a range of skills is required to
accomplish any single task.

Evaluation criteria 

Scoring is more costly and difficult for the performance assessment than for the
standardized multiple choice test. However, scoring criteria should not be
determined by cost or ease of scoring. Rather, it is essential to create scoring criteria
that reflect what really matters in the performance of a task (O'Neil, 1992). In
doing so, the integrity of the student's task is maintained. Carefully constructed
scoring criteria coupled with adequate training for raters will lead to accurate
scoring and high interrater consistency. In this context, the use of scoring
criteria is not a subjective process, albeit a more complicated one than machine
scoring a multiple-choice test. Wiggins (1992) suggests that two key questions be
asked when creating a scoring system: 1) "What are the most salient
characteristics of each level or quality of response?" and 2) "What are the errors that are
most justifiable for use in lowering a score?" He cautions that scoring criteria
should make use of descriptive language rather than evaluative language such
as "good," "excellent," and "fair." Quellmalz (1991) describes desirable
characteristics of scoring criteria, such as their generalizability to similar tasks, how
well they fit the task and the target population, whether they communicate
clearly to all audiences (students, teachers, parents, community) so that
they can be understood and applied, and whether they help to guide
decision-making about educational reform.

Baker et al (1991) suggest these criteria for the evaluation of the performance
exercises themselves: consequences, fairness, transfer and generalizability,
cognitive complexity, content quality, content coverage, meaningfulness, cost and
efficiency. Linguistic load is becoming a pressing issue in the evaluation of
performance assessments, as well.



The literature and the Profiles survey 

The Profiles survey was sent to educators in all 50 states, requesting
information about their large-scale performance assessment programs. It is apparent
from the sample of materials received by Profiles Corporation that the
performance assessments being developed by states and districts across the country
are, for the most part, striving to reach the ideals set forth in the literature.
Many exercises are embedded in authentic contexts, combining a number of
processes, encouraging critical thinking, and requiring creative performances of the
students being evaluated. Selected samples also make clear, however, that it is
possible for performance assessments to duplicate the weaknesses of
standardized multiple-choice tests. A constructed-response format does not
automatically raise the level of thinking required to answer a question. Many "performance
assessment" exercises are simply measuring low-level recall that could be more
efficiently measured in a multiple-choice format. A task that takes longer to do
and to score than a multiple-choice question is not automatically more
meaningful, relevant, authentic, or even a true performance. A performance assessment
impacts instruction and learning positively only to the extent that it has been
carefully and thoughtfully constructed in response to the achievement goals of
the state or school district.

The results of the survey revealed that most states are taking their time to
develop and field test the new assessments before attaching high stakes to
individual and group scores. It is imperative for the success of a program of
performance assessment that teaching and learning be given a reasonable opportunity
to change in response to the new models of instruction. Typically, students
perform poorly when a performance assessment program is first implemented.
When standards are high (as they should be) and students are generally
unpracticed in creatively utilizing a range of skills and thinking processes to perform a
complex task, scores will initially be low. Instruction and learning do respond
to this modeling, however, and scores should improve without having to sacrifice
the standards of quality. This sort of reform is precisely what performance
assessment aims to bring about. If high stakes are attached to students' scores
too early in the program's implementation, then future teaching and learning
will be compromised (O'Neil, 1992).

Performance assessment is not a panacea for the persistent problem of group 
differences that has plagued standardized multiple-choice tests since their 
inception, nor is it an easy answer to the problem of achieving equity in
evaluation. There is no clear evidence as yet that differences among gender and ethnic
groups are greater or lesser for performance measures than for traditional
measures (Herman, 1992; Dunbar et al, 1992). However, performance assessment's
potential to bring about positive changes in schools across gender and ethnic 
lines makes this issue one to watch closely in the next few years. 

Finally, it appears that performance assessment can be a powerful, liberating 
force guiding educational reform, or it can become merely another burden for
already burdened teachers. If performance assessment is to positively influence 
teaching and learning, teachers must be informed and involved in all stages of 
the program's development. Materials must be usable and "teacher friendly."
O'Neil (1992) appropriately demanded that scoring criteria reflect the true 
nature of the task. However, if scoring is consistently too difficult or too expen- 
sive, then educators will eventually begin to look elsewhere for answers to their 
assessment needs, and perhaps even return to traditional testing formats. If 
this occurs, then the long-needed, long-awaited reforms may well die before they 
have a chance to become rooted in the educational system. 



PROFILES Corporation -August, 1992 -Draft 




Initial implications of the literature review and the Profiles survey for
the Utah State Core Curriculum Performance Assessment

The design of Utah's State Performance Assessment should capitalize on the 
best of current theory and the strengths of assessments already in existence. It 
should also avoid the pitfalls and weaknesses exhibited by some assessments in 
use. The following ten general guidelines will be used in the development of the 
performance assessments: 

1) The performance assessments will, first and foremost, arise from and be 
guided by Utah's Core Curriculum. The tasks will be a representative sam- 
ple of the content areas and concepts dictated for each subject area and grade 
by the curriculum's standards and objectives. In addition, each task will 
require a number of processes in its execution. The processes will sample 
from key activities identified in the standards and objectives (for example, a 
science unit might incorporate such processes as "observe," "classify,"
"compare," and "infer"). The scoring criteria will also reflect the standards and
objectives of the Core Curriculum.

2) Tasks will be time-effective for students and teachers. Each assessment will
be approximately four pages in length and will be designed to be performed
in 45 minutes to an hour. The scoring criteria will be clear and easy to
understand and use. 

3) Materials and packaging will be designed so that teachers can easily dupli- 
cate assessments for student use. Assessments will be printed one-sided 
with each sheet coded to simplify organization and duplication. 

4) Schools will not be required to have special manipulatives or manipulative
"kits" in order to participate in the assessment process. Assessment materials
will be limited to objects normally found in any typical classroom.

5) Tasks will require students to think critically and to solve relevant problems. 

6) Assessments will be embedded in contexts that are meaningful and engaging
to the students, as well as reflective of real world tasks. 

7) Materials will be sensitive to ethnic and gender differences. 

8) Standards for evaluation of the assessments will be high and demand
excellent performances from students. They will reflect realistic rather than
arbitrary benchmarks.

9) Assessments will be constructed so that they model instruction that enhances 
student learning. Students will be required to structure problems and then 
integrate knowledge and processes to solve them. Many tasks will have mul- 
tiple paths and solutions to encourage divergent thinking. Whenever appro- 
priate, self-evaluation will be an integral part of each assessment. 

10) The evaluation criteria will be taken into account in the design and adminis- 
tration of the field test. 

Profiles Corporation is continuing to communicate with the states and school
districts that are conducting large-scale performance assessments. These
assessments are evolving rapidly, and an ongoing exchange of ideas is essential
to their overall success in the United States. The door is open for potentially
powerful, positive educational reform, and performance assessment is a key
component in this movement. The Utah State Core Curriculum Performance
Assessment Program can be a boon to students and educators in Utah. In
addition, by participating in the forefront of this reform, it may help to move the
entire country's educational system forward to meet the challenges of the 21st
century.

SUMMARY OF SURVEY RESULTS



Profiles Corporation contacted the State Departments of Education in the fifty 
states, including State Superintendents and Directors of Education, Academic 
Testing Supervisors, and Research Personnel. Administrators in the country's
largest school districts were also contacted. 

Information was collected from all fifty states. The chart on the following pages 
summarizes the data from the surveys. Forty states reported having state-wide 
performance assessment programs either in place or in development. Two other 
states indicated that performance assessment plans would be made sometime in 
the next five years. Eight states reported no definite plans for a performance 
assessment program. 

The current programs range from a single subject area being tested on one grade 
level to six or more subject areas on several grade levels. By far the most com- 
mon state performance assessment being conducted is a direct writing assess- 
ment (31 states). Reading, math, language arts, science, social studies, physical 
education, and art were also listed as subject areas currently being evaluated. 

In addition to returning the surveys, many states sent descriptive materials. 
These included summaries of their programs, samples of performance tasks for 
various grade levels in different subject areas, scoring criteria for the tasks, 
reports of student progress, and public information publications describing the 
programs. 

It is significant to note that 84% of the respondents who reported having pro- 
grams in place or in development sent accompanying materials and/or expressed 
an interest in discussing their programs in greater detail. This willingness to 
share information is one indicator that performance assessment has widespread 
support and interest in the educational community. 
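
The counts reported above can be cross-checked with simple arithmetic. The
sketch below only restates figures from this summary; the variable names are
illustrative. The number of respondents implied by the 84% figure is an
estimate that assumes the percentage is taken over the forty states with
programs in place or in development, which the summary does not state
explicitly.

```python
# Figures taken from the survey summary above.
states_with_programs = 40  # programs in place or in development
states_planning = 2        # plans expected within the next five years
states_no_plans = 8        # no definite plans
states_with_writing = 31   # direct writing assessment

# The three groups should account for all fifty states.
total_states = states_with_programs + states_planning + states_no_plans

share_with_programs = states_with_programs / total_states
share_with_writing = states_with_writing / total_states

# Assumption: the 84% applies to the 40 states with programs, not to all
# respondents (district administrators were also contacted).
estimated_sharing = round(0.84 * states_with_programs)

print(f"states accounted for: {total_states}")
print(f"share with programs in place or in development: {share_with_programs:.0%}")
print(f"share with a direct writing assessment: {share_with_writing:.0%}")
print(f"estimated states sharing materials or offering discussion: {estimated_sharing}")
```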
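
State: Georgia
Conducting a Performance Assessment Program? Yes
Status of Program: developing; in place
Subject areas, grades assessed: Writing - Gr. 3, 5, 11; Writing - Gr. 8, 10; Kindergarten evaluation
Contact Person: Kathleen Gooding, Consultant, Georgia Dept. of Education

State: • Atlanta
Conducting a Performance Assessment Program? Yes
Status of Program: in place
Subject areas, grades assessed: Kindergarten evaluation
Contact Person: Margaret Brooks, Director of Testing, Dept. of Research and Evaluation

State: •
Conducting a Performance Assessment Program? No
Status of Program:
Subject areas, grades assessed:
Contact Person: Dr. Vivian McMillan, Assistant Supervisor Testing

State: • Gwinnett Co.
Conducting a Performance Assessment Program? Yes
Status of Program: developing
Subject areas, grades assessed: at end of primary, elementary, middle, and senior levels
Contact Person: Dr. Wanda M. Warner, Director of Instruction

State: Hawaii
Conducting a Performance Assessment Program? Yes
Status of Program: in place; developing
Subject areas, grades assessed: General competencies - Gr. 10; Writing - Gr. 3, 6, 8, 10; Science, Language, Math, Reading, Social St. - Gr. 1-12 (multi-modal assessment)
Contact Person: Dr. Selvin Chin-Chance, Academic Testing Supervisor

State:
Conducting a Performance Assessment Program? Yes
Status of Program: in place
Subject areas, grades assessed: Writing - Gr. 8, 11
Contact Person: Sally Tieh, Coordinator, Guidance, Assessment and Evaluation

State: Illinois
Conducting a Performance Assessment Program? Yes
Status of Program: in place
Subject areas, grades assessed: Writing - Gr. 3, 6, 8, 11
Contact Person: Carmen Chapman, Assessment Consultant; Tom Kerins, Manager, Illinois State Board of Education

State: • Chicago
Conducting a Performance Assessment Program? Yes
Status of Program: in place
Subject areas, grades assessed:
Contact Person: Dr. Carole Perlman, Director Student Testing

State:
Conducting a Performance Assessment Program? Yes
Status of Program: developing
Subject areas, grades assessed: Math; Language Arts
Contact Person: Donna Long, Mathematics Consultant; Sheila Ewing, English/Language Arts Consultant

State:
Conducting a Performance Assessment Program? No
Status of Program:
Subject areas, grades assessed:
Contact Person: Dr. Mark Haack, Chief, Bureau of Instruction and Curriculum

State:
Conducting a Performance Assessment Program? Yes
Status of Program: in place; developing; researching
Subject areas, grades assessed: Math - Gr. 3, 4, 7, 10; Writing - Gr. 3, 7, 10; Reading - Gr. 3, 7, 10
Contact Person: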